Question 1 [4 marks]

The design matrix and the vector of regression coefficients, corresponding to a model
with one qualitative predictor variable with 3 treatment levels (A, B & C), are given below.

(a) What does the second regression coefficient (β2) represent? [1 Mark]
(b) What is the estimate for the mean of treatment C? [2 Marks]
(c) Which level is the baseline or reference level? [1 Mark]

level   X                β

A       [illegible]      4

B       1 1 0            [illegible]

C       [illegible]      5
Question 2 [8 marks]

In an investigation of the interdependence of water uptake (water), food intake (food)
and egg production (eggs), data were collected from a 10-day observation period of 12
birds. The question of interest was “Does water consumption depend on food consumption
and/or egg production?” R output from the analysis (Figure 1 and Tables 1, 2 & 3) is given
below.

Four students examined the output and made the following statements. In each case
state if the student is correct and justify your response with reference to the relevant
output.


(a) Student A: From Table 1 we see that the P-value for the regression coefficient of
eggs is significant (P = 0.0022), and the P-value for the regression coefficient of food
is also significant (P = 4 × 10^-8), hence both variables (food and eggs) should be
included as predictor variables in the model. [2 Marks]

(b) Student B: The F-test for the term for eggs in Table 2 is highly significant
(P = 7.8 × 10^-7), as is the F-test for food (P = 1 × 10^-5). Hence we should retain both
terms in the final model. [2 Marks]

(c) Student C: Although there is an association between water consumption and egg
production, once we have taken food intake into account, egg production is not
providing any additional information about water consumption (Table 3: P = 0.333)
and so should not be included in the model. [2 Marks]

(d) Student D: The coefficient for eggs is negative (Table 3, coefficient = -3.257), and
so we can conclude that there is a negative association between water consumption
and egg production. [2 Marks]


Figure 1: Pairs Plot
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Coefficients: 


(Intercept) 


Table 1: 
Call: lm(formula = water ~ eggs) 
Coefficients: 
Estimate Std. Error t value Pr(>|t]) 
(Intercept) 153.4 24.5 6.26 9.4e-05 
eggs 20.4 5.0 4.08 0.0022 
Call: Im(formula = water ~ food) 


Estimate Std. Error t value Pr(>|t]) 
-63.513 20.671 -3.07 0.012 
2.733 0.185 14.78 4e-08 




Table 2:

Analysis of Variance Table

Response: water
     Df Sum Sq Mean Sq F value  Pr(>F)
eggs  1  42850   42850   143.4 7.8e-07
food  1  23072   23072    77.2 1.0e-05

Table 3:

Call: lm(formula = water ~ eggs + food)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -83.077     28.141   -2.95    0.016
eggs          -3.257      3.187   -1.02    0.333
food           3.031      0.345    8.79  1.0e-05
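
As a reading aid for the coefficient tables, each "t value" is the estimate divided by its standard error. The sketch below is written in Python rather than R purely for illustration; the names table3 and t_value are hypothetical and only the numbers come from Table 3.

```python
# Rows of Table 3 as (term, estimate, standard error, reported t value).
table3 = [
    ("(Intercept)", -83.077, 28.141, -2.95),
    ("eggs", -3.257, 3.187, -1.02),
    ("food", 3.031, 0.345, 8.79),
]


def t_value(estimate, std_error):
    """Wald t statistic for the null hypothesis that the coefficient is zero."""
    return estimate / std_error


# Recompute each t value to two decimals and show it next to the reported one.
computed = {term: round(t_value(est, se), 2) for term, est, se, _ in table3}
for term, est, se, reported in table3:
    print(f"{term:12s} computed t = {computed[term]:6.2f}   reported t = {reported:6.2f}")
```

The recomputed values agree with the printed column to the two decimals shown, so the table is internally consistent.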
Question 3 [8 marks]

Since you’ve been doing so well in Statistical Modelling this trimester your statistics
lecturer has asked you to help design a study to measure “UNE student satisfaction”.

(a) Provide two well-defined response variables that would allow you to measure UNE
student satisfaction. For each of your response variables you must also clearly define
(as applicable) either the levels or units with which you would measure your response
variable. At least one of your response variables should be categorical and contain at
least 3 different levels. [4 Marks]

(b) What are two explanatory variables that you would recommend be collected as part
of the study? [1 Mark]

(c) What type of sampling scheme would you use to promote an unbiased and representative
collection of 500 responses from UNE students for the study? Provide support
for your answer. [2 Marks]

(d) Briefly comment on one additional aspect of experimental design that you would
suggest to be used as part of the study to ensure robust results. [1 Mark]


Question 4 [8 marks]

Three different herbicides (A, B, C) were compared using a completely randomized
design with six plots per herbicide. Included in the design were six untreated plots (D).
The number of weeds per plot was counted.


(a) A one-way analysis of variance was performed, with count as the response variable.
State the assumptions of the linear model and, with reference to the diagnostic plots
(Figure 2), check the assumptions for this model. [2 Marks]

[Diagnostic plot panels: Residuals vs Fitted (Residuals against Fitted values); Standardized residuals]

Figure 2: Diagnostics for one-way analysis of variance


(b) Refer to the Box-Cox plot (Figure 3). What transformation would you suggest
so that the assumptions would not be violated? Choose from the identity, reciprocal,
log or square root transformation. Justify your choice. [1 Mark]
[Box-Cox plot; axis label: log-Likelihood]

Figure 3: Box-Cox estimate


(c) The data are analysed using the appropriate transformation of count (trcount). The
estimates and 95% confidence intervals for the transformed means are presented in
Table 4.

(i) Verify that the mean weed count (in the original scale) for plots treated with
herbicide B is 8.85. Show your calculations. [2 Marks]

(ii) Give an informative interpretation, keeping in mind the aim of the experiment.
[3 Marks]

Table 4:
  lower mnfit upper
A 0.745  1.11  1.48
B 1.806  2.18  2.54
C 2.680  3.05  3.42
D 3.082  3.45  3.82
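
The back-transformation asked for in (c)(i) can be sketched numerically. The Python snippet below is an illustration only: it applies the inverse of each transformation named in (b) to the Table 4 value for herbicide B and compares the result with the 8.85 stated in the question. The names mnfit_b, stated_mean_b and candidates are hypothetical, and the 0.01 tolerance is an assumption chosen to allow for rounding in the table.

```python
import math

mnfit_b = 2.18        # transformed-scale mean for herbicide B (Table 4)
stated_mean_b = 8.85  # original-scale mean quoted in part (c)(i)

# Inverse of each candidate transformation, applied to the transformed mean.
candidates = {
    "identity": mnfit_b,           # y = trcount
    "reciprocal": 1.0 / mnfit_b,   # y = 1 / trcount
    "log": math.exp(mnfit_b),      # y = exp(trcount)
    "square root": mnfit_b ** 2,   # y = trcount squared
}

for name, value in candidates.items():
    print(f"{name:12s} back-transformed mean = {value:.3f}")

# Transformations whose back-transformed mean reproduces the stated value.
matches = [name for name, value in candidates.items()
           if abs(value - stated_mean_b) < 0.01]
print("consistent with 8.85:", matches)
```

The same inverse function would be applied to the lower and upper limits to express the confidence interval in the original scale.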
Question 5 [8 marks]

In an experiment to compare the uptake of a chemical (g/day) from two dietary supplements,
14 animals were randomly divided into two groups of seven. The initial body
weight (kg) was also recorded. A plot of the data is given in Figure 4 and relevant output
appears in Tables 5 and 6.

[Plot of the data; axis label: Initial body weight]

Figure 4:


(a) A stepwise regression was performed and the relevant R code and output are given in
Table 5. Explain the process and output, and interpret the result. [3 marks]

Table 5:

## R code ##
start.model <- lm(Uptake ~ Initial * Diet, data = supp.df)
formL <- formula(~ 1)
formU <- formula(~ Initial * Diet)
step.model <- step(start.model, trace = 1,
                   direction = "backward",
                   scope = list(lower = formL, upper = formU))

## Output ##
Start:  AIC=17.7
Uptake ~ Initial * Diet

               Df Sum of Sq  RSS  AIC
<none>                      28.0 17.7
- Initial:Diet  1      24.7 52.7 24.5
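
To make the AIC column in Table 5 concrete: for a linear model, the AIC that R's step() reports is n*log(RSS/n) + 2*edf, where edf is the number of estimated regression coefficients. The Python sketch below reproduces the two printed values under that formula. The function name lm_step_aic is hypothetical, and the coefficient counts (4 with the interaction, 3 without) are inferred from the model formula, not printed in the output.

```python
import math


def lm_step_aic(rss, n, n_coef, k=2.0):
    """AIC up to an additive constant for a linear model: n*log(RSS/n) + k*n_coef."""
    return n * math.log(rss / n) + k * n_coef


n = 14  # 14 animals in the experiment

# Full model: intercept, Initial, Diet2 and Initial:Diet2 (4 coefficients).
aic_full = lm_step_aic(rss=28.0, n=n, n_coef=4)

# Candidate model with the interaction removed (3 coefficients).
aic_reduced = lm_step_aic(rss=52.7, n=n, n_coef=3)

print(f"AIC with interaction:    {aic_full:.1f}")
print(f"AIC without interaction: {aic_reduced:.1f}")
```

Backward selection drops a term only when doing so lowers the AIC, so comparing the two printed values is the comparison step() makes at this stage. Small differences from the table are expected because the RSS values are rounded.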


(b) With reference to the table of regression coefficients and their 95% confidence
intervals (Table 6), answer the following questions.

(i) Does the relationship between Uptake and Initial body weight differ between
diets? Justify your response. [2 marks]

(ii) Write the relevant equation for the relationship between Uptake and Initial body
weight for animals on Diet 2. [2 marks]

(iii) Showing all calculations, determine the predicted uptake for animals on Diet 2
with initial body weight of 13 kg. [1 Mark]

Table 6:


> summary(step.model)

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)      -5.37       4.59   -1.17    0.269
Initial           1.56       0.34    4.57    0.001
Diet2            10.63       7.06    1.51    0.163
Initial:Diet2     1.57       0.53    2.97    0.014

Residual standard error: 1.7 on 10 df
Multiple R-squared: 0.992, Adj R-squared: 0.99
F-statistic: 433 on 3 and 10 DF, p-value: 7e-11

> confint(step.model)
               2.5 % 97.5 %
(Intercept)   -15.60    4.9
Initial         0.80    2.3
Diet2          -5.10   26.3
Initial:Diet2   0.39    2.7
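
A model with an interaction between a covariate and a two-level factor defines one straight line per factor level. The Python sketch below shows how the Table 6 estimates combine into a line for each diet. It assumes Diet 1 is the reference level (suggested by the coefficient name Diet2, but not stated in the output). The names coef, line_for_diet and predict_uptake are hypothetical, and the 10 kg weight is an arbitrary illustration.

```python
# Estimates from Table 6 (summary of step.model).
coef = {
    "(Intercept)": -5.37,
    "Initial": 1.56,
    "Diet2": 10.63,
    "Initial:Diet2": 1.57,
}


def line_for_diet(diet):
    """Return (intercept, slope) of Uptake against Initial for diet 1 or 2.

    Assumes Diet 1 is the reference level, so the Diet2 terms are added
    only when diet == 2.
    """
    is_diet2 = 1.0 if diet == 2 else 0.0
    intercept = coef["(Intercept)"] + is_diet2 * coef["Diet2"]
    slope = coef["Initial"] + is_diet2 * coef["Initial:Diet2"]
    return intercept, slope


def predict_uptake(initial_kg, diet):
    """Predicted uptake (g/day) for a given initial body weight and diet."""
    intercept, slope = line_for_diet(diet)
    return intercept + slope * initial_kg


for diet in (1, 2):
    b0, b1 = line_for_diet(diet)
    print(f"Diet {diet}: Uptake = {b0:.2f} + {b1:.2f} * Initial")

# Illustrative weight only; any weight inside the observed range could be used.
print(f"Diet 1 at 10 kg: {predict_uptake(10.0, 1):.2f}")
print(f"Diet 2 at 10 kg: {predict_uptake(10.0, 2):.2f}")
```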
Question 6 [7 marks]

The RMS Titanic was a British passenger liner that sank in the North Atlantic Ocean
in the early morning of 15 April 1912, after colliding with an iceberg during her maiden
voyage from Southampton to New York City. Of the 2,224 passengers and crew aboard,
more than 1,500 died, making it one of the deadliest commercial peacetime maritime
disasters in modern history. Explanatory variables that were used in the model were
Pclass (passenger class: upper and lower), Sex (male and female), and Age (measured in
years). Table 7 provides partial code and corresponding output from a GLM that
was run on the Titanic dataset. Using this output answer the questions that follow.

(a) What is the response variable for this model and which type of GLM was used?
[1 Mark]

(b) Is the model a good fit for the data? Justify your response. [1 Mark]

(c) Use correct notation to write out the statistical model that was used in the analysis,
defining all terms. [2 Marks]

(d) Give an informative interpretation of the regression coefficients. [3 Marks]
Table 7:

> model <- glm(Survived ~ ., family = binomial(link = 'logit'), data = train)

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept)  5.137627   0.594998   8.635  < 2e-16
Pclasslower -1.087156   0.151168  -7.192 6.40e-13
Sexmale     -2.756819   0.212026 -13.002  < 2e-16
Age         -0.037267   0.008195  -4.547 5.43e-06

Analysis of Deviance Table

       Df Deviance Resid. Df Resid. Dev  Pr(>Chi)
NULL                     799    1065.39
Pclass  1   83.607       798     981.79 < 2.2e-16
Sex     1  240.014       797     741.77 < 2.2e-16
Age     1   17.495       796     724.28 2.881e-05
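
Coefficients from a logit-link model are on the log-odds scale, so exponentiating them gives odds ratios, and the inverse logit of the linear predictor gives a fitted probability. The Python sketch below applies both steps to the Table 7 estimates. It assumes Survived is coded 1 for survival, which the output does not state. The names coefs, odds_ratios and survival_probability are hypothetical, and the example passenger is invented for illustration.

```python
import math

# Estimates from Table 7 (log-odds scale).
coefs = {
    "(Intercept)": 5.137627,
    "Pclasslower": -1.087156,
    "Sexmale": -2.756819,
    "Age": -0.037267,
}

# Odds ratio for a one-unit increase in each predictor, others held fixed.
odds_ratios = {term: math.exp(b) for term, b in coefs.items()
               if term != "(Intercept)"}


def survival_probability(pclass_lower, male, age):
    """Fitted probability from the logistic model (assumes Survived = 1 means survived)."""
    eta = (coefs["(Intercept)"]
           + coefs["Pclasslower"] * (1.0 if pclass_lower else 0.0)
           + coefs["Sexmale"] * (1.0 if male else 0.0)
           + coefs["Age"] * age)
    return 1.0 / (1.0 + math.exp(-eta))


for term, ratio in odds_ratios.items():
    print(f"{term:12s} odds ratio = {ratio:.3f}")

# Hypothetical passenger: 30-year-old female in the upper class.
p_example = survival_probability(pclass_lower=False, male=False, age=30)
print(f"example fitted probability = {p_example:.3f}")
```

An odds ratio below 1 means the odds of the modelled outcome fall as that predictor increases or switches from its reference level.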
Question 7 [7 marks]

In an experiment to test the efficiency of spyware programs to detect viruses the following
numbers of threats were detected when boys aged 12 - 18 years old were left to surf the
web for a period of 1 hour with four different programs. The first two programs developed
were Attack and 2xAttack by the Phaser Company. However, two recent programs on the
market (Cryptic and Destroy) appear to be more effective. In particular, Cryptic was just
ranked the best spyware program of the year by ComputersAreCool magazine. The mean
number of threats detected by each of the four programs is listed in Table 8.

Table 8:
Spyware  | n | Mean number of threats detected
Attack   | 5 | 20.5
Cryptic  | 5 | 37.4
2xAttack | 5 | 19.5
Destroy  | 5 | 29.8

You decide to run some statistical analyses to determine which program is the most
effective.


(a) Using the standard notation for a given contrast (e.g. c(1, -1, -1, 1)) and assuming that
the four spyware programs are in the same order as in Table 8 give the notation for
the following four contrasts: [2 Marks]

(i) Comparing the spyware from the Phaser Company vs the two more recent programs (C1).
(ii) Comparing the two programs distributed by Phaser (C2).
(iii) Comparing the two new programs against each other (C3).
(iv) Comparing the efficiency of Cryptic as compared to the other programs tested (C4).
(b) Are contrasts C1 and C4 orthogonal? How can you tell? [1 Mark]

(c) R output for the ANOVA using contrasts C1, C2 & C3 is given in Table 9. With
reference to the output, verify the mean number of threats detected by the two newer
programs (Cryptic and Destroy), as given in Table 8. Show your calculations.
[3 Marks]

Table 9:
            Estimate Std. Error 2.5 % 97.5 %
(Intercept)     26.8       0.88  25.0   28.6
C1              -6.8       0.88  -8.7   -5.0
C2               0.5       1.24  -2.1    3.1
C3               3.8       1.24   1.2    6.3

(d) Briefly comment on one confounding factor that may have an impact on the results
found in this study. [1 Mark]
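
The orthogonality check referred to in part (b) can be sketched in general terms. Two contrasts are orthogonal when the sum of the products of their coefficients, each divided by the group size, is zero; with equal group sizes (n = 5 per program here) this reduces to a zero sum of products. In the Python sketch below the function names are hypothetical, and the contrast vectors are illustrations, not the intended answers to part (a).

```python
def is_contrast(c, tol=1e-12):
    """A contrast has coefficients that sum to zero."""
    return abs(sum(c)) < tol


def are_orthogonal(c1, c2, n=None, tol=1e-12):
    """Check orthogonality of two contrasts, allowing unequal group sizes.

    With group sizes n_i the condition is sum(c1_i * c2_i / n_i) == 0;
    when n is omitted, equal group sizes are assumed.
    """
    if n is None:
        n = [1] * len(c1)
    return abs(sum(a * b / m for a, b, m in zip(c1, c2, n))) < tol


example = [1, -1, -1, 1]   # the illustrative contrast quoted in part (a)
other_a = [1, 1, -1, -1]   # hypothetical second contrast
other_b = [1, -1, 0, 0]    # hypothetical third contrast
group_sizes = [5, 5, 5, 5]

print(is_contrast(example))                           # coefficients sum to zero
print(are_orthogonal(example, other_a, group_sizes))  # products: 1 - 1 + 1 - 1
print(are_orthogonal(example, other_b, group_sizes))  # products: 1 + 1 + 0 + 0
```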


Please remember: This examination paper MUST BE HANDED IN. Failure to do so
may result in the cancellation of all marks for this examination. Writing your name and
number on the front will help us confirm that your paper has been returned.